Oblivious vs. Distribution-Based Sorting: An Experimental Evaluation
Authors
Abstract
We compare two algorithms for sorting out-of-core data on a distributed-memory cluster. One algorithm, Csort, is a three-pass oblivious algorithm. The other, Dsort, also makes three passes over the data and is based on the paradigm of distribution-based algorithms. In the context of out-of-core sorting, this study is the first comparison between the paradigms of distribution-based and oblivious algorithms. Dsort avoids two of the four steps of a typical distribution-based algorithm by making simplifying assumptions about the distribution of the input keys. Csort makes no assumptions about the keys. Despite the simplifying assumptions, the I/O and communication patterns of Dsort depend heavily on the exact sequence of input keys. Csort, on the other hand, takes advantage of I/O and communication patterns that are predetermined, governed entirely by the input size, in order to overlap computation, communication, and I/O. Experimental evidence shows that, even on inputs that followed Dsort's simplifying assumptions, Csort fared well. The running time of Dsort varied greatly across five input cases, whereas Csort sorted all of them in approximately the same amount of time. In fact, Dsort ran significantly faster than Csort in just one of the five input cases: the one that was the most unrealistically skewed in favor of Dsort. A more robust implementation of Dsort, one without the simplifying assumptions, would run even slower.
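To make the contrast between key-dependent and predetermined communication concrete, here is a toy Python sketch. It is an illustration only, not the authors' Csort or Dsort code. It assumes p processors, fixed splitters that were chosen for keys uniformly distributed on [0, 100), and a hypothetical round-robin redistribution standing in for an oblivious step. The names `distribution_destinations`, `oblivious_destinations`, and `loads` are made up for this example, and the sketch models only where records are sent in one pass, not the sorting itself.

```python
from bisect import bisect_right
from collections import Counter


def distribution_destinations(keys, splitters):
    """Distribution-based step: a record's destination processor is found by
    comparing its key with sorted splitters, so where data goes (and how much
    each processor receives) depends on the actual key values."""
    return [bisect_right(splitters, key) for key in keys]


def oblivious_destinations(n, p):
    """Oblivious step (hypothetical round-robin redistribution): record i goes
    to processor i mod p no matter what its key is, so the pattern depends
    only on the input size n and the processor count p."""
    return [i % p for i in range(n)]


def loads(destinations, p):
    """Number of records sent to each of the p processors."""
    counts = Counter(destinations)  # missing processors count as 0
    return [counts[j] for j in range(p)]


if __name__ == "__main__":
    p = 4
    splitters = [25, 50, 75]     # assumption: picked for keys uniform on [0, 100)
    uniform = list(range(100))   # matches the splitters' assumption
    skewed = [7] * 100           # every key falls into the first bucket

    print(loads(distribution_destinations(uniform, splitters), p))  # [25, 25, 25, 25]
    print(loads(distribution_destinations(skewed, splitters), p))   # [100, 0, 0, 0]
    print(loads(oblivious_destinations(100, p), p))                 # [25, 25, 25, 25]
```

The round-robin step sends the same number of records to every processor for any input of a given size, which is the property that lets an oblivious algorithm schedule its I/O and communication ahead of time. The splitter-based step is balanced only when the keys match the distribution the splitters were chosen for, and it collapses onto one processor when they do not.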